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ABSTRACT 

Large organizations have a multitude of data sources across the 
enterprise and wan" to obtain bus i > 1 
While the majority of these data sources may be consolidated in 
an enterprise data warehouse, many business units have their own 
data marts where analysis is carried out against data stored in 
multidimensional data structures. It is often critical to pose 
queries which span both these sources. This is a challenge since 
these sources i i dels and query languages (SQL vs 

MDX). The Siebel Analytics Server enables this requirement to 
be fulfilled. In this paper, we describe how the multicim i 1 1 i 
metadata is mo iele 1 reli ti mail) within Siebel Analytics, efficient 
SQL to MDX !■ in at , 1. rithms and the conversion protocols 
required to convert a multidimensional result into a relational 
rowset. 

1. INTRODUCTION 

Most companies have made significant investments in data 
warehousing technologies and business intelligence tools to 
maximize the value of their enterprise wide information. While 
the majority of the data may be loaded and transformed into a 
centra! data warehouse, there still remain other data sources in the 
organization which are extremely valuable but are not in the 
warehouse. 

The ability to query multiple federated data sources is key for 
answering critical business queries. For example, finance 
departments typically use multidimensional databases for 
budgeting that allow them to carry out sophisticated calculations 
thai cannot be performed in a relational database. A common 
query compares budgets versus actuals where the actuals are 
stored in a data warehouse. Another example is real time 
reporting where a single report may need up to the minute 
headcouni data from the HR system along with the latest 
information or. »h» -jkv pipeline i i Si office systems. 
Permission n> make digital or hard copies; of all or part of this work for 
personal or classroom use is gtsmtcd without fee provided trial copies sic 
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The problem of federation has been addressed extensively by the 
database community both in general and with regard to support 
for specific data sources [1, 5, 6], In this paper, we address the 
problem of Integrating multidimensional data sources in a 
federated system. Multidimensional data sources typically have 
custom access methods with their own proprietary query 
languages and API's. Most of the vendors in this segment now 
support the XMLA standard [4], an open industry-standard web 
services interface designed for online analytical processing 
(OLAP). The existence of a common standard supported by a 
large number of vendors has made it viable to invest resources in 
devising algorithms and techniques for federating this class of 
data sources. 

Federating multidimensional systems is challenging. These 
systems have a rich set of metadata that includes dimensions with 
multiple hierarchies and measures with aggregation rules. This 
rich metadata does not have any direct analog in the relational 
world but still has to be used to pose meaningful queries against 
these data sources. There exists a powerful query language MDX 
which has a number of specialized operators over this metadata 
and returns results in the form of a multidimensional dataset 
rather than a rowset. In this paper, we describe how these issues 
are addressed in the context of the Siebel Analytics Server, a 
federated relational system. 

The structure of the paper is as described below. In Section 2, we 
discuss the architecture of the Analytics Server with an emphasis 
on query' processing and data modeling. We define the scope of 
the multidimensional data source federation problem. In Section 
3, we provide an overview of multidimensional data sources, the 
MDX query language and the XMLA protocol. In Section 4, we 
describe how multidimensional metadata is i 
Analytics. In Section 5, we address how to generate MDX queries 
from combination of an internal query plan representation and 
tii tridimensional data source specific metadata. We describe how 
the multidimensional result set is converted back into rowsets. 
Section 6 summarises the contributions of this paper and 
discusses areas of future work. 



2. ARCHITECTURE OF THE SIEBEL 
ANALYTICS SERVER 

The Siebel Analytics Server is an ANSI SQL compliant query 
server at the core of Siebei's Analytics/Business Intelligence 
Suite. The Siebel Analytics Suite has a web based front end 
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' 3 he Analytics 

Server via ihe ODBC protocol. These querit u ised aga 
logical tables, a concept we describe below. The logical queries 
are translated into the appropriate queries against physical back 
end databases. The results from ihe backend databases are 
combined and further manipulated before being returned to the 
-a national 

system with the capability of query execution but without a 
storage subsystem. 

The SQL que s ^ r ver are posed against a 

logical business model. A business mode! presents business 
information in a manner that parallels business analysts' 
understanding ot the business structure rather than the physical 
structure in which the data may be stored. The business model 
contains dimen < irchi sasui a id other data 

warehousing concepts. Note that unlike multidimensional data 
sources, the >r j ial ihese objects serve as 

an aid to modeling. 

A key concept required to support business models is the logical 
table. A logical table consisting of one or more logical columns is 
an abstraction above a physical table. It can be a subset of a 
physical table (a subset of columns or a subset of rows); it can 
combine the contents of two or more physical tables or it can be 
derived from other logical tables. Typically, logical tables reduce 
complexity in the information model because a single logical 
table can map to multiple physical tables, Similarly, a logical 
column is an abstraction i in It can be 

mapped to one or more physical columns, to a scalar expression 
involving physical columns or to other logical columns. 
Logical tables and columns can be mapped to multiple sources. 
These sources can originate from multiple databases of the same 
or different types. This federation capability supports horizontal 
and vertical fragmentation across data sources. The ability to 
specify alternative data sources allows users to model replicated 
data sources as well as support aggregation navigation by 
specifying alternative sources at different levels of granularity. 
The relationships between logical tables are specified in terms of 
joins. These joins may be foreign key joins based on logical keys 
specified for each logical table, or outer joins. The repository has 
metadata pertaining to the physical data sources being mapped. 
This includes connectivity information, information about the 
type of the data source and a features table describing ihe 
capabilities of the data source. 

During query processing the Analytics Server takes a SQL query' 
as input, parses it and converts it into internal data structures. 
Based on the metadata information, it deduces which physical 
tables correspond to the iogica! tables referred to in ihe query and 
produces an initial query plan in terms of these tables. All 
aggregate navigation is carried out at this point. This; initial query 
plan is then optimized by a rule based query compiler. The query 
compiler carries out optimizations like pushing aggregations 
>ciow joins i ired to convert the plan 

to a state where we can generate SQL to ship to one or more 
backend databases. This includes determining which operations 
shouid be executed remotely and which operations should be 

xecuted withi! 1 s ve le goal is to ex e 3 

ch ctionaHty as p< e at \ end database This has 

i 1 ' porting as small a set of daia from 

the back end database(s) io the Analytics Sen/er which is 



typically a dominant cost in qaer j si j Post processing 
steps are carried out on the results of ihe query and the results are 
then shipped back to the client. 

We now examine the areas impacted by the requirement to 
support multidimensional data sources. Multidimensional data 
sources are modeled in the physical layer of the metadata 
repository as a new database type and are mapped io the business 
model layer in an identical fashion to relational databases. Since 
users construct queries 3 t 

data sources are completely transparent to them. The joins 
between multidimensional data sources and relational data 
sources are specified in the business mode! layer. 
Supporting muitidimen onal data jources requires specialized 
metadata for hierarchies and levels to enable us to generate fviDX 
rather ihan SQL. Since the structure of a query plan is relational, 
in es snce 01 iHtdimen r ^.ode generation module 5QL 
to MDX translator. Our design goal is to ship as broad a class of 
queries as possible -- the ability to push down GROUP BY's 
being especially important. This satisfies the goal of this project 
which is the ability to efficiently combine data from relaiional and 
multidimensional data sources 



3. MULTIDIMENSIONAL DATA 
SOURCES, MDX AND THE XMLA 
PROTOCOL 

Multidimensional data sources typically present the user with a 
dimensional view of data. The data is typically organized around 
two key 1 oncep dimensions me isures. Each dimension can 
have one or more hierarchies. For example the Time dimension 
may have a Year-> Quaner-> Monlh->Day hierarchy and/or a 
Year->Week hierarchy. Measures are metrics that are of business 
interest like profit or sales. Multidimensional data sources are 
organized as cubes which consist of a collection of dimensions 
and measures. Many useful business questions are answered by 
obtaining the value of one or more measures for selected 
combinations of dimensional values. This form of data modeling 
is typical in data warehousing scenarios [2]. 
Multidimensional data providers often have (heir own storage 
subsystems which are optimized for efficient access of 
^i.iiMii.- ! 1 i ■ \ ige subsystems use extensive pre- 
calculation of aggregates at various levels to provide efficient 
performance Many vendor products use the term OLAP (Online 
Analytical Processing) to describe this space. The terms MOLAP 
and ROLAP are used to distinguish between ihe storage 
met htwsm used: ROLAP products use relational storage while 
MOLAP products use multidimensional data structures. HOLAP 
systems combine aspects of both, 

):;e ban 1 t icnsiona! data providers 

has been the absence of a standard interface for access. The 
XMLA standard [4] addresses this gap. It has two methods: 
DISCOVER and EXECUTE. The DISCOVER method is used to 
obtain information about the metadata. It returns a lis! of cubes 
available for querying along with details of their dimensions, 
hierarchies, levels and measures in an XML format. The 
DISCOVER call is invoked on a URL. The EXECUTE, method is 
used to send queries for execution io an XMLA data source. This 
consists Of sending an MDX query for execution. 
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The MDX language is specialized for querying multidimensional 
data. MDX has rich functionality for manipulating members and 
sets of members of a dimension by means of functions like 
Ancestors?) and bet><.<.r« 

Series calculations exist. Some business calculations require 
t n different d mensions; 

MDX has the concept of a solve order for dealing with 
1 A i ?vT0> c >i t r,. 

described in detail in [3], we will focus on the subset of MDX thai 
the Analytics Server generates. 

A kev d. f reii t id a SQL c 1 i 

MDX query returns a multidimensional grid of cells as output to a 
query. The number of dimensions in this grid depends upon the 
number of axes specified in the query. Each axis of a query is 
composed of a set of tuples each of which can have one or more 
dimensions. A zero axis query will still return a single cell as 
output to the query. The basic structure of a MDX query is as 
shown below. 

SELECT [<axis_specification> 

[, <axis_specification>.. .J] 

FROM [<cube specification^ 

[WHERE [<slicer_specification>] j 
In the tempiate above, the cube specification is the name of the 
cube being queried. This is a single cube name and unlike SQL 
there is no syntax for specifying joins. The WHERE clause is 
optional and is referred to as the sheer axis. The MDX WHERE 
clause is semantically distinct from its usage in SQL; it serves the 
purpose of restricting dimensions not explicitly specified in the 
SELECT clause to values specified in a tuple. We make this 
clearer by means of an example. 

SELECT 

{ [Measures] . [Profit] } ON COLUMNS, 
i [Time] . [2002] , [Time] . [2003] } ON ROWS 
FROM Sales 

WHERE ( [Store] . [USA] . [California] ) 

This query requests Profit for stores in California for the years 
2002 and 2003. The output is a grid with two cells, one 
containing the profit for 2002 and the other for 2003, The sheer 
axis ensures that the profit is calculated only for California. We 
have not explicit! specific m. ic < -man rule, this ts defined 
in the cube metadata. Note that there is syntax for explicitly 
referring to dimensional members. While in this query there is a 
two dimensional grid as output, a greater number of axes can be 
specified. 



4. MODELING MULTIDIMENSIONAL 
DATA SOURCES IN SIEBEL ANALYTICS 
METADATA 

We now describe how to model multidimensional data sources in 
Siebel Analytics. We map a multidi i a source to a 

cubetable in the physical level of the metadata repository. This 
can be carried ju! autom v the import functionality 

of the metadat J* 1 r te vh eh obtains the relevant 
metadata from the specified XMLA provider. The import 



functionally uses the DISCOVER cail of the XMLA protocol to 
map the met , i i ves 

described below. A cubetable is a special version of a relational 
table augmented with special metadata that allows us to generate 
MDX queries against the datasource. In alt other respects it 
behaves identically to a relational table. Cubetables are mapped to 
the logical layer of the repository in tire same manner as relational 
tables. During query processing, MDX will be generated against 
cubetables rather than SQL. 

Each cubetable is based on a single multidimensional cube, A 
cubetable consists of cube columns. A cube column behaves in 
the same function as a regular column but is annotated with 
special metadata (hierarchy and level information) used only for 
MDX code generation purposes. A cube column can either be a 
measure or a dimensional column. Cube Columns which are 
measures are annotated with an aggregation ruie (For example, 
sales may be associated with aggregation rule SUM. This 
information is normally available for non-derived aggregation 
rules via the XMLA protocol). Dimensional columns are either 
level keys or property 1 k oa level of a 

hierarchy while a property is functionally dependent upon a level 
key. We illustrate this by means of an example. 
Consider a cube with two hierarchies: Time: Year-> Quarter -> 
Month and Geography: State -> City -> Store Name. Store Name 
has the property Store Manager. We have two measures Profit and 
Sales which both have aggregation rule SUM associated with 
them We w ish to map this cube into a cube table T. 
The resultant cube table T has the following cube columns (Year, 
Quarter, Month, State, City, Store Name, Manager, Profit, Sales). 
Profit and Sales are labeled as measures and have the aggregation 
rule SUM associated with them. The cube table T has two 
hierarchies Time and Geography associated with it. Time has the 
following levels: Year, Quarter and Month. Each of these levels 
has a levei key of the same name associated with it. The 
Geography hierarchy has 3 levels: State, City and Store Name, 
Each level has a level key of the same name associated with it. 
Additionally, the Store Name level has Store Manager associated 
with it as a property. Cube Table T has a primary key which is the 
set of all cube columns which are also level keys. In this example 
this consists of the set (Year, Quarter, Month, State, City, Store 
Name) Both measures are functionally dependent upon the key. 
We have some restrictions on how cubes can be mapped to 
cubetables, A single cubetable can contain only one hicraichy per 
dimension. If a user needs to map additional hierarchies, an 
additional cubetable will have to be created. Additionally, we 

sii 1 1 1 - i r„r } ; a horn i hierarchy 

This implies thai every node of every level in the hierarchy is of 
the same type and every branch of the hierarchy has the same 
number of levels This framework does not model parent child 

lierarehies which are self e number of 

levels is not explicitly specified up front A typical example of 
t sl i f i es s ' i i J e p > nships 
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5. MDX CODE GENERATION AND 
QUERY PROCESSING 

We now examine the steps taken to process a query. We then 
outline the issues specific to queries that reference a 
multidimensional data source and describe how they are bandied. 

1 . The user poses a SQL query against the logical mode! 
built in the Siebe! Analytics metadata. This is 
converted into a logical query representation. 

2. The navigator transforms the logical query into a plan 
comprised solely of objects from the physical backend 
data sources. 

3. The i z les the query a 
the completion of this phase the query plan is ready for 
execution. Note thai this plan is purely relational in 
nature, 

4. The execution plan may reference multiple physical 
tables. The plan is marked appropriately with 
indications of where each operation will be executed. 

' i ibase joins and other post processing 
operations are marked for execution in the middle tier. 

5. In a relational setting, the data corresponding to the 
portion of a query plan marked for remote execution 
can be obtained by converting the required portion of 
the plan into a SQL query. This conversion is referred 
to as code generation and results in the appropriate 
vendor dialect of SQL being produced. 

6. After execution of the SQL queries, data is fetched to 
the Analytics Server. Post processing is carried out and 
the resulting rowset is returned to the eiicnt 

The overall flow of query processing requires modification when 
a backend data source is multidimensional. In step 5 we produce a 
SQL query from a relational query plan fragment. For 
multidimensional data sources MDX rather than SQL is 
generated. This problem is referred to as the MDX Code 
Generation problem tor the remainder of tins paper. In step 6, we 
refer to post processing carried out in the server on rowsets. 
However, the response to a MDX query is a multidimensional 
data set (in XML) and not a rowset. An unpivoting protocol is 
required which specifies how to convert from a multidimensional 
data set to a rowset. We first examine a simple unpivoting 
protocol in detail before moving to the broader problem of MDX 
Code Generation. 

We describe a simple protocol used for Microsoft Analysis 
Sen/er, We restrict the class of MDX queries to two dimensions. 
This implies that we will have a two dimensional grid of result 
cells. The XML output file consists of a set of tuples on columns, 
a set of tuples on rows and the two dimensional result set of cells 
delimited by these rows and columns. Each cell has a unique 
index. This index is determined by a formula specified in the 
XMLA protocol, We construct a rowset from this XML file as 
follows. All tuples specified on columns are ignored. In the class 
of MDX queries we intend to generate the columns tuples will be 
containing the names of the measures in the query and will not 
contain any data required for the resulting rowset We construct 
each row by a cells from that 

row. This is illustrated in the example below. 



Example: We describe how an intermediate rowset is constructed 

for the MDX query below. 

SELECT 

{ [Measures] . [Sales] 1 on COLUMNS, 

! [Products 
FROM [Sales] 

The XML output file will contain tf iten 1 ■ esj. [Sales] 
as the column tuple. Let the tuples returned on tows be (1998, 
Pepsi), (1998,Coke), (1997,Pepsi), (1997,Coke). We construct a 
rowset by appending the appropriate cell values to each row tuple. 
T tethc (oj i 5 r i oris paragraph enables us 

to transform an XML output file into a rowset. However, the 
MDX query generated may not have an exact correspondence 
with the execution plan. This could be foi a number of reasons: 
(1) All MDX queries return a measure even for dimensional only 
queries where the default measure is returned. This additional 
column in the obtained rowset may need to be pruned. (2) We 
may require a different ordering of columns in the project list. In 
the scheme specified earlier, all measures columns will be in 

il members of the row tuples. 
Correct ordering of columns can be ensured by using a simple 
protocol consisting of a list of integers. The list (2, 1 ,4) implies 
that the mappings between the intermediate rowset and the final 
rowset are as follows. Intermediate Column(IC) 2 maps to Final 
Column(FC) 1, IC 1 maps to FC2 and IC 4 maps to FC 3. Note 
that not every column in the intermediate rowset has to map to a 
column in the final rowset, Every MDX query generated would 
have a corresponding protocol list generated. Different data 
sources may have different mapping protocols. The more complex 
this protocol, the greater is the overhead of middle tier processing. 

5.1 MDX Code Generation 

MDX code generation aims to generate a query based on the 
portion of the query execution plan marked for execution at the 
multidimensional database. The query plan is relational, in 
essence we are trying to translate SQL to MDX. We translate a 
subset of SQL corresponding to the template below. 
SELECT cl, c2.„, aggr(mi), aggr(m2) 

FROM Table 

GROUP BY cl, c2.„. 

This template is typical of queries where users are interested in 
obtaining metrics at a specified level of aggregation. We support 
other query templates but for ease of exposition we restrict 
ourselves to this primary use case. We examine various code 
generation strategies using queries against the table in the 
example given t v. I i i ts a ke> facet oi MDX 
code generation. We then present the MDX code generation 
algorithm in its entirety, 



Consider a cube, Sales, with two hier archies. Time (Levels: Year, 
Quarter) and Store (Levels: Store Country, Store State) and one 
measure L i ( i 

cubetable T(Store Country, Store State, Year, Quarter, Unit 
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Sales). Lei the aggregation rule associated with Unit Sales be 
SUM. 

Query i [Multiple dimensions plus measure with matching 
aggregate rule] 



set [A] as 'Descendants i [Store: , (Store State])' 
select 

{ (Measures] . [CountryAncj , [Unit Sales! 1 on 
columns, 

! [A]) on rows 
From [Sales] 



Select Store Country, Year. SUMfUtvt Sales) 
Group By Store Country, Year 



This query requires us to crossjoin two dimensions to obtain the 
tuples to be placed on rows. We can directly use the member 
[Unit Sales] because the aggregation rule in the query matches 



Query 2 [Measure with non-matching aggregate rule] 



Select Store Country, Year, A VGfUnit Sales) 
From T 

Group By Store Country, Year 



, [Stc 



;sio. 



o Country] . members i ' 
] .members) ' 

cycrossjoinU [A] i, ( [B])> ' 
[MSI] as 

Descendants (Store. cur: 



Descendants (Time. currentmember, (Quarter!) !, [Unit 
Sales] ) ' 
select 

( [MSI] ) on columns, 
( [C] } on rows 
from [Sales] 

For each combination of country and year we need to find the 
average sales. The measure needs to be explicitly computed 
because AVG differs from the aggregation rule associated with 
[Unit Sales] which is SUM. This requires us to model the 
measure as a calculated measure [MS ! ], where each value at the 
gTain of Store state and Quarter is examined to compute the 
average sale. We are making use of the MDX WITH section to 
define named sets and i i ures. Sets A and B arc- 
named sets which are used to store the dimensional members 
while set C consists of the tuples obtained by crossjoining A and 
So obtain all possible combinations of countries and years. 

Query 3 [Single dimension with multiple levels plus measure] 

Select Siore Country, Store State, SUM(U<.>.>,< Sates) 
From T 

Group By Store Country, Store Stale 



In this query we require multiple levels of the same dimension. In 
the MDX language a tuple can only contain one member from a 
single dimension. We handle this by modeling Store Country, the 
level closer to the toot, as a calculated member using the Ancestor 
function. If we changed the order of country and state in the 
SELECT list we would still generate the same MDX query. This 
is an example of a query where the post processing protocol 
ensures that two different queries result in different answers. 

>w> s l i measure with matching 

aggregate i < him 

Select Store Country. Year, SUMfUnit Sales) 

Where Store Country in (VSA ', ■India') AND Year = '1997' 
Group By Store Country, Year 

Alternative 1 
with 

set [A] as '[[Store Country] .members ) ' 

set [B] as ' ( [Year] .members ) ' 

set [C] as ' nonemptycross join ( ( [A] ) , I [B] ) ) ' 

{ [Unit Sales] I or, columns, 

Filter! ( [C] }, (Store . currentmember . name = "USA" 
OR Store, currentmember. name = "India") AND 

time. currentmember. name = "1997") on rows 
from [Sales] 

Alternative 2 

wi th 

set [A] as '{ filter (i .Store Country! .members. 

Store . currentmember . name ~ "India") )' 

set [B] as '( filter ( [Year] .members, 
time. currentmember. name = "1997") )' 
set [C] as 1 nonemptycross join ( 1 [A] 1 , ( (B] ) ) 1 

{[Unit Sales]) or. columns, 
[ [C] (on rows 
from [Sales] 

This query illustrates how the WHERE clause is processed. In the 
first MDX query the WHERE clause is executed after the 
crossjoin. In the second query we break the WHERE clause into 
constituents which are applied on each dimension before the 
crossjoin. The second query is more efficient since a crossjoin is a 
very expensive operation and its input sets are smaller in this 
case. Relational systems typically have optimizers that are 
responsible for optimizations like push down of predicates. 
Optimizers for suitidimensiona! data sources are less mature and 
hence it is critical to generate the second query. This optimization 
cannot be carried out for ail classes of predicates. For example, if 
we substitute the AND in the query with an OR the disjunctive 
predicate ws!' h : ■ !uat«i 3fH he crossjoin 



Queo - [M > 

with matching aggregate rale, preris i! t > mp-ojeeted and 

projected levels] 



I, i ston ' 
From T 

Where Store Country = 'USA ' AND Store State = 'CA ' AND 

quarter IN ( 'Q3 ', Q4 '} 

Group By Store Country, Store State. Year 



member [Measures ]. [CountryAnc] as 

in t t ,i • t-r [Store 

Country)) .name' 

set [A] as -filter! {[Store Country 

] .members I , Stor c e "USA")' 

set[B) as' Filter ( Generate (([A]}, Descendants 

( : Store; . eurrentmember, i St ore j . : Store State])), 

[Store) .curteotroeuibor.name- "CA" )' 

set [C] as ' i [Tirael , [Year: .members) ' 

set [Dj as 'nonernptycross join i i [B] 1 , { [C] > ) ' 

member [measures J . (MSI ] as 

'SUM(filter(Desc > i 

Time. currentmember. name = "Q4"),[Unit Sales])' 
select 

( [Measures] . [CountryAnc] , 

[Measures] . [MSI] ) on columns, 
([D]) on rows 
From [Sales] 

In this query, we have predicates on multiple levels of a 
dimension. We apply the predicate to the named set containing 
members at the Country level. We use the generate function to 
obtain the descendants at the state level only of the members of 
the previously created named set, Notice that we explicitly 
compute the s U s cs ,i j i < < ni 

even for matching aggregate rules, we cannot always use 
precomputed aggregates at the backend data source. 

5.1.1 MDX Code Generation Algorithm for SELECT- 
FROM- WHERE-GROUP BY Queries 
The Input: SQL Query 

Output: MDX query and corresponding Protocol List 

1 . Examine SELECT list of query. If multiple levels from 
a dimension are present, create the appropriate 
calculated Ancestor members. 

2. Convert WHERE clause to CNF. If a conjunct can be 
associated with a single level of a single dimension, add 
it to the lis! of conjuncts marked for early evaluation. 

3. For every dim >n, create a 1 J t 1 i 
projected level. If any filters marked for early 
evaluation are applicable, apply them in the definition 
of She named set. If named sets are created tor multiple 
levels of a dimension, the latest named set is sourced 
from previously created named sets. 

4. Create a named set ([Q]) corresponding to the set 
obtained by crossjoining the lowest projected named set 
for each dimen on A .pp cable cross dimensional filters 
are applied here. 



5. For each aggrcg.i in th t list wh h does not 
have a match ig> »ate rule create a new calculated 
xe liber wh regau i j> i ee! s 

6. Output all an e i I a! ated members 
on columns, named set [Q] on rows. 

7. IfHAVTNN i ( v ant, apply predicates on [Q] 

Other query templates such as SELECT-FROM- WHERE queries 
will have a code generation algorithm which is similar in spirit. 

6. CONCLUSIONS AND FUTURE WORK 

In this paper we described how !o model and query 
i i 1 i ionai data -.ourtes within Siebe! Analytics. This is 
achieved by modeling multidimensional data sources as 
cubetables - relational tables augmented with hierarchy and level 
metadata in the physical layer. We showed examples of various 

*' jl r k s Mis f \i I"*, , j , j IflDiiCC 

issues with different alternatives. We examined how various 
classes of WHERE clause predicates are supported in MDX. 
Given that (he result of a MDX query is a multidimensional data 
set, we described how this output is mapped io a relational rowset. 
This ahiliry 1 1 i i ,i sural data in 

a single integrated framework fulfills a criiicai business need. 
We are pursuing several areas of future work. There is 
considerable variance in MDX support from different vendors. 
Restricted MDX support implies modifying our MDX code 
generation algorithms and post processing protocols. Overall 
query response time is best when we minimize the amount of data 
fetched to the middle tier. We aim to support as wide a template 
of SQL queries as possible so that the resultant MDX results in a 
minimal set of data being fetched. Cubetables currently model 
only balanced hierarchies. Modeling parent child hierarchies 
needs new metadata and new MDX code generation algorithms. 
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